In [1]:
import swat
# host, port, username, and password are placeholders for your CAS server details.
conn = swat.CAS(host, port, username, password)
In [2]:
conn.simple?
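The ? suffix used above is IPython syntax for displaying help on an object, here the simple action set. You can also use Python's built-in help function.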
In [3]:
help(conn.simple)
Let's start with the summary action. We'll need some data, so we'll load a CSV file from a URL into a CAS table, then run the action on it.
In [4]:
cars = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')
out = cars.summary()
out
Out[4]:
The result here is a CASResults object, which is a subclass of a Python dictionary. In this case it has only one key, "Summary", and the value for that key is a DataFrame. Storing the DataFrame in a variable makes it easier to work with, and from there any of the standard Pandas DataFrame operations apply. Here we set the first column as the index so that selecting data is easier later on.
In [5]:
df = out['Summary']
df.set_index(df.columns[0], inplace=True)
df
Out[5]:
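To try the indexing step without a CAS server, the sketch below uses a plain dict with one "Summary" key as a stand-in for the CASResults object. The first column name ("Column") and all of the numbers are assumptions made for illustration; they are not output from the cars data.

```python
import pandas as pd

# Stand-in for the CASResults object: a plain dict with one "Summary" key.
# The column name "Column" and every number below are illustrative placeholders.
out = {
    "Summary": pd.DataFrame(
        {
            "Column": ["MSRP", "Invoice", "Horsepower"],
            "Min": [10.0, 9.0, 1.0],
            "Mean": [20.0, 18.0, 2.0],
            "Max": [30.0, 27.0, 3.0],
        }
    )
}

df = out["Summary"]

# Without inplace=True, set_index returns a new DataFrame and leaves df unchanged.
indexed = df.set_index(df.columns[0])

print(list(out.keys()))
print(indexed)
```

Using the returned copy instead of inplace=True keeps the DataFrame stored in the result object unmodified, which can be convenient if the original layout is needed again.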
Now that we have an index, we can use the DataFrame's loc indexer to select rows by index value and columns by name.
In [6]:
df.loc[['MSRP', 'Invoice'], ['Min', 'Mean', 'Max']]
Out[6]:
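The same kind of selection can be tried on a small hand-built DataFrame. This is only a sketch: the row labels mirror the ones used above, and the values are placeholders rather than real statistics.

```python
import pandas as pd

# Hand-built stand-in for the indexed summary table; values are placeholders.
summary = pd.DataFrame(
    {"Min": [10.0, 9.0, 1.0], "Mean": [20.0, 18.0, 2.0], "Max": [30.0, 27.0, 3.0]},
    index=["MSRP", "Invoice", "Horsepower"],
)

# Lists of labels select several rows and columns, in the order requested.
subset = summary.loc[["Invoice", "MSRP"], ["Max", "Min"]]

# Scalar labels select a single cell.
single = summary.loc["MSRP", "Mean"]

print(subset)
print(single)
```

Note that loc returns rows and columns in the order of the labels you pass, not in the order they appear in the table.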
In the previous example, we called the summary action directly. This gave us a CASResults object containing a DataFrame with the result of the action. You can also call many of the Pandas DataFrame methods directly on the CASTable object, so in many ways the two are interchangeable. One of the most commonly used DataFrame methods is describe. Its output includes statistics that you would otherwise obtain by running variations of the summary, distinct, topk, and percentile actions. All of that is done for you, and the output is the same as what you would get from an actual Pandas DataFrame. The difference is that the CASTable version can handle much larger data sets.
In [7]:
cars.describe()
Out[7]:
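For comparison, this is what describe returns on an ordinary in-memory Pandas DataFrame. The column names and values are hypothetical; the point is only the layout of the result, which the text above says the CASTable version matches.

```python
import pandas as pd

# Small in-memory frame with hypothetical numeric columns.
local = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

# With default arguments, describe reports these statistics for numeric columns.
desc = local.describe()

print(desc.index.tolist())
print(desc)
```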
Other DataFrame methods that work on CASTable objects include min, max, and std. Each of these calls simple.summary in the background, so if you need more than one of them, you may be better off calling describe once and reading all of the values from its output.
In [8]:
cars.min()
Out[8]:
In [9]:
cars.max()
Out[9]:
In [10]:
cars.std()
Out[10]:
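The advice about calling describe once can be sketched with a plain Pandas DataFrame, since the text says CASTable follows the same interface. The data here is hypothetical and numeric only; whether a single describe call is actually faster on a given CAS table is not measured here.

```python
import pandas as pd

# Hypothetical numeric data standing in for a table.
local = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

# One describe call, then pick out the rows of interest.
stats = local.describe().loc[["min", "max", "std"]]

print(stats)
```

Each row of stats corresponds to what the separate min, max, and std calls would return for the numeric columns.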
In [11]:
conn.close()